The complete amino acid sequences of hepatitis C virus (HCV) core (residues 1 to 115) and putative matrix 
(residues 116 to 190) proteins were synthesized as 18-residue-long peptides with an 8-amino-acid overlap. The 
peptides were assayed with 50 human serum samples with antibodies to HCV (anti-HCV) and 46 serum samples 
without anti-HCV, as determined by several commercial assays. Immunodominant regions were defined within 
residues 1 to 18, 11 to 28, 21 to 38, 51 to 68, and 101 to 118. The peptides that covered these regions were 
recognized by 40 of 50 (80%), 42 of 50 (84%), 36 of 50 (72%), 34 of 50 (68%), and 36 of 50 (72%) of the 
anti-HCV positive serum samples, respectively. Two anti-HCV negative serum samples were each repeatedly 
reactive with one peptide, but both were found to be negative by confirmatory anti-HCV assays. Four serum 
samples that were confirmed to be positive for anti-HCV in commercial assays did not recognize any of the 
peptides that cover the HCV core-matrix regions. Ninety-two percent of anti-HCV-positive serum samples 
reacted with a combination of peptides covering residues 1 to 18 and 11 to 28. Testing of peptides that contain 
the reported genotypic variations of the HCV core within the regions at residues 1 to 18, 51 to 68, and 101 to 
118 showed that a change from Thr-110 to Asn-110 decreased the reactivities of eight serum samples. In 
conclusion, we found that human antibodies to the HCV core-matrix protein(s) are mainly directed to linear 
determinants and can easily be reproduced by using short synthetic peptides. We also found that such 
antibodies develop in more than 90% of HCV-infected people. 


The hepatitis C virus (HCV), which was first described in 
1989 (4, 12), is a member of the Flaviviridae (5, 11, 19, 27). 
The genome appears to encode three structural proteins, 
the core, putative matrix, and envelope proteins, and possi- 
bly six nonstructural (NS) proteins, NS1, NS2, NS3, NS4a, 
NS4b, and NS5 (23). HCV is prevalent at rates of between 
<0.36 and 1.0% among blood donors in industrialized coun- 
tries (6, 10, 13, 20) and at rates of about 2.5 to 6.4% among 
populations in developing countries (1, 26). At present, the 
most commonly used assays in Europe and the United States 
use the recombinant C-100-3 construct that covers parts of 
the NS4 protein (4), although recently, new assays that 
contain either long synthetic peptides or recombinant pep- 
tides that cover both structural and nonstructural HCV 
products have been introduced (8). The early assays suffered 
from unsatisfactory sensitivities and specificities (6), whereas 
the new assays seem to have improved both and are therefore 
more accurate (2). We were interested in further characterizing 
the immune recognition of HCV and in identifying the 
immunodominant regions within the HCV core protein by using 
short synthetic peptides. 

MATERIALS AND METHODS 

Patient sera. A total of 96 serum samples were selected 
from consecutive serum samples sent to the National Bacteriological 
Laboratory from March 1991 to May 1991. They 
were selected because they had been assayed for anti-HCV 
by commercial assays (first- and second-generation tests 
[Abbott Laboratories, North Chicago, Ill.], Hepanostica C 
[Organon Teknika, Boxtel, The Netherlands], or second-generation 
tests [Ortho Diagnostic Systems, Raritan, N.J.]). 




On testing, 50 of the serum samples were found to be 
reactive and 46 were found to be negative by the commercial 
assays. The majority (40 of 50) of the anti-HCV-positive sera 
were obtained from intravenous drug users. 

HCV core and putative matrix amino acid sequence. The 
amino acid sequences of the HCV core (residues 1 to 115) 



FIG. 1. (a) Sum of OD405 values for positive reactions with 44 
anti-HCV-positive sera. Only sera that gave an OD405 that 
exceeded the mean + 7 standard deviations for negative sera are 
included. (b) Mean ± standard deviation OD405 for negative sera for 
the peptides covering the HCV core and matrix proteins; data are 
from two different runs: mean 1 (n = 46 serum samples) and mean 2 
(n = 12 serum samples). x axis: residues of HCV capsid protein. 
TABLE 1. Epitope mapping of 18-amino-acid peptides with an 8-amino-acid overlap with 50 human serum samples found to be positive 
for anti-HCV by C-100-3 or multiple antigen assays^a 

Total no. of serum samples that were reactive (per peptide, in order from the N terminus): 
40 42 36 20 20 34 25 1 7 16 36 5 0 3 2 9 7 4 

a The cutoff was set at the mean + 7 standard deviations for negative serum samples. Sera were sorted according to the recognition pattern of the N-terminal 
portion at residues 1 to 58 only to increase readability. 

b The core protein was from residues 1 to 115; the matrix protein was from residues 116 to 190. +, OD405 of <1.0; ++, OD405 of >1.0; +++, OD405 of >2.0; 
OD405 values were determined by peptide enzyme immunoassay. 
TABLE 2. Reactivities of combinations of peptides from three 
immunodominant regions within the HCV core in detecting 
50 anti-HCV-positive sera^a 

HCV core peptide          No. (%) of      OD405 (range) of   OD405 (mean ± SD) of 
combination (residues)    sera reactive   reactive sera      anti-HCV-negative sera 

1-18 and 11-28            46 (92)         0.548-2.600        0.087 ± 0.019 
1-18 and 51-68            44 (88)         0.220-2.600        0.074 ± 0.016 

a OD values that exceeded the mean + 7 standard deviations were regarded 
as reactive. The peptide combinations were assayed at various concentrations 
(0.01 to 0.001 mg/ml), and the concentrations that gave the highest sensitivity 
were used to obtain the results presented here. 


and the putative matrix (residues 116 to 190) proteins were 
obtained by comparing previously sequenced genomes (5, 
11, 18, 19, 23-25). The sequence from a U.S. strain (24) was 
selected and synthesized as 18-residue peptides with an 
8-residue overlap between each peptide. 
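
The peptide design described above can be sketched as a sliding window. This is a minimal illustration, not the authors' procedure: the function name `overlapping_peptides` and the placeholder sequence are hypothetical, and the real HCV sequence is not reproduced here.

```python
def overlapping_peptides(sequence, length=18, overlap=8):
    """Return (start, end, peptide) tuples with 1-based inclusive residue numbers."""
    step = length - overlap  # 18 - 8 = 10 residues between consecutive peptide starts
    peptides = []
    for start in range(0, len(sequence) - length + 1, step):
        peptides.append((start + 1, start + length, sequence[start:start + length]))
    return peptides


# Placeholder 190-residue string (NOT the HCV core-matrix sequence); it only
# serves to show which residue ranges the windows cover.
placeholder = "A" * 190
windows = overlapping_peptides(placeholder)
print(len(windows))
print(windows[0][:2], windows[1][:2], windows[-1][:2])
```

With these settings a 190-residue sequence yields 18 windows (1 to 18, 11 to 28, and so on), which matches the 18 core-matrix peptides mentioned in the text. The last window of this sketch ends at residue 188; how the final residues were handled in the actual synthesis is not stated in the source.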

Also, a set of five peptides within the immunodominant 
regions that are reported to vary among strains (residues 1 to 
18, 51 to 68, and 101 to 118) (5, 11, 18, 19, 23, 24) was 
synthesized. The peptides were as follows: one peptide with 
an Ile-4 instead of an Asn-4, one peptide with a Lys-6 instead 
of an Arg-6 and an Asn-11 instead of a Thr-11, one peptide 
with a Trp-61 instead of an Arg-61, one peptide with a Val-68 
instead of an Ala-68, and one peptide with an Asn-110 
instead of a Thr-110. 

A total of 23 different HCV core peptide analogs were 
synthesized, and all except the 5 analogs containing strain- 
specific amino acid substitutions were assayed by using the 
96 serum samples. 

Peptide synthesis. Peptides were simultaneously synthe- 
sized by a slightly modified version of a recently described 
method (21). In brief, the peptides were synthesized on 30 
mg of Polyhipe resin (NovaSyn PR 500; NovaBiochem, 
Läufelfingen, Switzerland) held in polypropylene bags. All 
amino acids were prepared in stock solutions and were 
dispensed to the appropriate peptide bag at each coupling 
step. Throughout the synthesis, N,N-dimethylformamide 
was used as the sole solvent for the coupling and washing 
steps. The final peptides were then cleaved and deprotected 


by using trifluoroacetic acid containing the appropriate scav- 
engers (21). All peptides were analyzed by high-pressure 
liquid chromatography. Peptides were lyophilized and then 
dissolved in distilled water at a concentration of 1 mg/ml. 

Peptide enzyme immunoassays. Peptides were coated onto 
microtiter plates (Maxisorp 96F Certificate; Nunc, Roskilde, 
Denmark) at 0.001 mg per well overnight at 4°C in NaHCO3 
(pH 9.6). Before testing, the plates were blocked by the 
addition of 2% bovine serum albumin for 2 h at room 
temperature. Sera were then added at a 1:100 dilution and 
were incubated for 45 min at 37°C. Serum-bound immuno- 
globulin G was then detected by adding alkaline phos- 
phatase-labeled goat anti-human immunoglobulin G (A-3150; 
Sigma Chemical Co., St. Louis, Mo.) diluted 1:1,500. 

Dilutions of sera and conjugate were prepared in phos- 
phate-buffered saline containing 2% goat serum, 2% bovine 
serum albumin, and 0.05% Tween 20. 

In inhibition experiments, the enzyme immunoassay was 
performed as described above by adding 0.01 to 0.03 mg of 
the respective peptide at the same time that sera were added 
to the coated wells. 

Since nonspecific reactions have been one of the major 
problems with commercial anti-HCV assays, we set a 
stringent cutoff to obtain a high specificity: only reactions 
with absorbances that exceeded the mean + 7 standard 
deviations of the absorbances of anti-HCV-negative sera 
were regarded as reactive. 
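
The cutoff rule can be written out as a short calculation. The sketch below reads the rule as "mean plus 7 standard deviations of the negative sera"; the absorbance values are hypothetical, and the use of the sample standard deviation is an assumption, since the source does not say which estimator was used.

```python
from statistics import mean, stdev


def reactivity_cutoff(negative_ods, n_sd=7):
    """Cutoff = mean of negative-serum absorbances + n_sd sample standard deviations."""
    return mean(negative_ods) + n_sd * stdev(negative_ods)


def is_reactive(od, negative_ods, n_sd=7):
    """A serum is scored reactive only if its absorbance exceeds the cutoff."""
    return od > reactivity_cutoff(negative_ods, n_sd)


# Hypothetical OD405 readings of anti-HCV-negative sera for one peptide.
negative_ods = [0.08, 0.10, 0.09, 0.07, 0.11, 0.09]
cutoff = reactivity_cutoff(negative_ods)
print(round(cutoff, 3))
print(is_reactive(0.548, negative_ods), is_reactive(0.15, negative_ods))
```

A multiplier as large as 7 trades sensitivity for specificity: weakly reactive sera fall below the cutoff, but background noise in negative sera is very unlikely to be scored as a positive reaction.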

Hydropathic index of the HCV core-matrix protein se- 
quence. The hydropathic index of the HCV core-matrix 
protein sequence (residues 1 to 190) was predicted and 
plotted by the method described by Kyte and Doolittle (14) by 
using Micro Genie software on a personal computer (Inter- 
national Business Machines). 
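
A hydropathy profile of this kind can be sketched in a few lines. The scale values below are the published Kyte-Doolittle indices; the window width of 7 and the function name `hydropathy_profile` are assumptions for illustration, because the source does not state the settings used in the Micro Genie software.

```python
# Kyte-Doolittle hydropathy indices (positive = hydrophobic, negative = hydrophilic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Mean hydropathy index over each sliding window of the given width."""
    scores = [KD[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Toy sequences only: an arginine run is strongly hydrophilic,
# an isoleucine run strongly hydrophobic.
print(hydropathy_profile("RRRRRRR"))
print(hydropathy_profile("IIIIIII"))
```

Windows with strongly negative averages correspond to hydrophilic peaks, which is the property the immunodominant regions are compared against in the Results.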

RESULTS 

Mapping of antigenic regions within HCV core-matrix 
proteins. Results from the testing of 50 anti-HCV-positive 
serum samples with the 18 HCV core-matrix peptides are 
given in Fig. 1a and Table 1, and the mean optical density at 
405 nm (OD405) of the anti-HCV-negative sera is given in Fig. 
1b. The values in Fig. 1a are given as the cumulated 
absorbances (sums of the OD405s) of all positive reactions for 


FIG. 2. Comparison of two runs with 44 anti-HCV-positive and 4 anti-HCV-negative sera with the original peptide at residues 1 to 18 (a). 
The same serum samples were used to compare the recognition of the original peptide at residues 1 to 18 with the recognition of two peptides 
covering residues 1 to 18 containing the strain-specific substitutions Asn-4 to Ile-4 (N to I) (b) and Arg-6 to Lys-6 and Thr-11 to Asn-11 (R 
to K, and T to N, respectively) (c). Values are given as the OD405, and correlation is given as R² values. Axis label: OD405 with peptide 1-18 
(original sequence). 

FIG. 3. Comparison of two runs with 44 anti-HCV-positive and 4 anti-HCV-negative sera with the original peptide at residues 51 to 68 (a). 
The same sera were used to compare the recognition of the original peptides at residues 51 to 68 and 101 to 118 with the recognition of three 
peptides covering residues 51 to 68 and 101 to 118 containing the strain-specific substitutions Arg-61 to Trp-61 (R to W) (b), Ala-68 to Val-68 
(A to V) (c), and Thr-110 to Asn-110 (T to N) (d). Values are given as the OD405, and correlation is given as R² values. Axis labels: OD405 of 
peptide 51-68 (original sequence) and OD405 of peptide 101-118 (original sequence). 


each peptide obtained with the sera that were positive by 
commercial anti-HCV assays. The major antigenic regions 
were covered by peptides 1 to 18, 11 to 28, 21 to 38, 51 to 68, 
and 101 to 118. The regions at residues 1 to 18, 11 to 28, and 
21 to 38 seemed to contain distinct antigenic regions, since 
some sera were found to react with each of these peptides 
and not with the adjacent overlapping peptides (Table 1; for 
example, sera 5087, 3750, and 3733). Peptides 51 to 68 and 61 
to 78 were also found to have reactivities, suggesting two 
distinct sites. Most serum samples showed combined reac- 
tivities to regions 1 to 38, 51 to 68, and 101 to 118; only two 
serum samples recognized fewer than three peptides. The 
peptide reactivities of the anti-HCV-positive sera could be 
inhibited by more than 50% by adding the specific peptide, 
suggesting specific reactions. 
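
The inhibition criterion can be made explicit with a one-line formula. This is a sketch under the assumption that inhibition is expressed as the relative drop in absorbance when free peptide is added; the source gives only the 50% threshold, and the readings below are hypothetical.

```python
def percent_inhibition(od_without_peptide, od_with_peptide):
    """Relative reduction in absorbance caused by adding free peptide, in percent."""
    if od_without_peptide <= 0:
        raise ValueError("uninhibited absorbance must be positive")
    return 100.0 * (od_without_peptide - od_with_peptide) / od_without_peptide


# Hypothetical readings: OD405 drops from 1.2 to 0.3 when the peptide is added.
inhibition = percent_inhibition(1.2, 0.3)
print(round(inhibition, 1), inhibition > 50.0)
```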

None of the 18 HCV core-matrix peptides was recognized by 
all 50 anti-HCV-positive serum samples. Most sera reacted 
with the peptides at residues 1 to 18 and 11 to 28, which were 
recognized by 40 and 42 serum samples, respectively. Thirty-six 
serum samples recognized the peptide at residues 
21 to 38, 34 serum samples recognized the peptide at 
residues 51 to 68, and 36 serum samples recognized the 
peptide at residues 101 to 118. The negative sera had a higher 
background reaction when they were initially tested with the 
peptides at residues 11 to 28 and 141 to 158 (Fig. 1b). Upon 
retesting, reactivities were found to be reduced to a level 
similar to those of the other peptides. 

Of the 46 anti-HCV-negative serum samples, 2 serum 
samples were each found to react reproducibly with one 
peptide of the HCV core peptides. These two serum samples 
were found to be reproducibly negative by using several 
second-generation anti-HCV assays, even though the pep- 
tide reactivities of both serum samples to the HCV core 
could be inhibited by adding the specific peptide to the 
solution. These reactions were not regarded as indicative of 
previous exposure to HCV. Four of the 50 anti-HCV- 
positive serum samples were not reactive with any of the 
HCV core-matrix peptides. All four were found to react with 
nonstructural recombinant proteins in the Supplemental 
Assay (Abbott); and by the same assay and with the struc- 
tural recombinant proteins, one gave strong reactivity, two 
gave values just above the cutoff, and one was nonreactive. 

Testing of a combination of HCV core-matrix peptides. A 


combination of peptides covering residues 1 to 18 and 11 to 
28 for coating showed a complementary function and in- 
creased the sensitivity to 92% (Table 2). The inclusion of the 
peptide at residues 51 to 68 did not add to the sensitivity or 
to the positive-to-negative ratio when assayed with our sera. 
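
The complementary effect of combining peptides can be illustrated by pooling per-peptide reactivities. Note the assumption: in the study the peptides were coated together in one well, whereas this sketch simply takes the union of sera reactive with each single peptide. The serum identifiers and reactivity sets are hypothetical.

```python
def combined_sensitivity(reactive_by_peptide, peptides, n_positive):
    """Percentage of positive sera reactive with at least one of the listed peptides."""
    reactive = set()
    for peptide in peptides:
        reactive |= reactive_by_peptide[peptide]
    return 100.0 * len(reactive) / n_positive


# Hypothetical reactivity sets for 10 positive sera (IDs 1 to 10), not study data.
reactive_by_peptide = {
    "1-18": {1, 2, 3, 4, 5, 6, 7},
    "11-28": {2, 3, 4, 5, 6, 7, 8, 9},
    "51-68": {1, 2, 3, 4, 5, 6},
}
pair = combined_sensitivity(reactive_by_peptide, ["1-18", "11-28"], 10)
triple = combined_sensitivity(reactive_by_peptide, ["1-18", "11-28", "51-68"], 10)
print(pair, triple)
```

In this toy data set the third peptide adds nothing because every serum it detects is already detected by the first two, which mirrors the observation reported for the peptide at residues 51 to 68.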

Testing of genotype-specific HCV core-matrix peptides. 
Testing of the genotype-specific peptides showed that sera 
with low-level reactivities (OD405, <0.5) almost exclusively 
changed their reaction patterns when they were assayed with 
these peptides (Fig. 2 and 3). The reactivity to the N-termi- 
nal portion of the HCV core protein showed a stable 
cross-reaction between peptides from HCV strains of differ- 
ent genotypes (Fig. 2). 

One serum sample became reactive to the peptide at 
residues 51 to 68 if residue Ala-68 was substituted by Val-68 
(Fig. 3c). The amino acid-specific change from Thr-110 to 
Asn-110 was found to be the most sensitive substitution (Fig. 
3d), and eight reactive serum samples became negative. The 
reactivities recorded with the substitution peptides were 
significantly correlated to the reactivities with the original 
peptide (P < 0.001 for all; regression analysis). Also, the 
reproducibilities of the peptide assays were found to be high 
when we compared duplicate samples run on different occa- 
sions (Fig. 2a, P < 0.001, and Fig. 3a, P < 0.001; regression 
analysis). 
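
The R² values reported in Fig. 2 and 3 come from regressing one set of absorbances on another. A minimal sketch of that calculation for simple linear regression follows; the paired values are made up, and the function name `r_squared` is illustrative only.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Made-up paired readings: original peptide versus substituted peptide.
original = [1.0, 2.0, 3.0]
substituted = [1.0, 2.0, 2.0]
print(r_squared(original, substituted))
```

An R² close to 1 indicates that sera react almost identically with both peptides; sera that lose reactivity after a substitution pull the value down.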


FIG. 4. Immunodominant regions within the HCV core and 
matrix proteins with respect to the hydropathic character plotted by 
the method of Kyte and Doolittle (14). Regions marked: 1-38, 51-68, 
101-118, and 151-168. x axis: residue of the HCV core/matrix proteins. 
The hydropathic character of the HCV core-matrix peptide 
sequence. The hydropathic character of the HCV core- 
matrix peptide amino acid sequence (14) is plotted in Fig. 4 
in relation to the immunodominant regions. All regions 
contained hydrophilic peaks. 

DISCUSSION 

Using short synthetic peptides, we identified three immu- 
nodominant regions within the HCV core protein, all of 
which were hydrophilic. Our data suggest that the major 
region at residues 1 to 38 contains a minimum of three 
distinct binding sites, whereas the region at residues 51 to 78 
might contain two distinct binding sites, and the region at 
residues 101 to 118 contains only one distinct binding site. 
The putative matrix protein or carboxy-terminal half of the 
core protein seems to contain fewer immunogenic regions, 
with the major linear site located within residues 151 to 168. 
The immunodominant regions within the HCV core protein 
are easily reproduced by short synthetic peptides, suggesting 
that the epitope regions contain linear determinants. This 
suggestion corroborates the finding of Nasoff et al. (17), who 
were able to detect reactivities to the HCV core protein by 
using short recombinant constructs of the N-terminal portion 
of the HCV core peptide. This is different from what has 
been shown for some other viral proteins, such as the 
hepatitis B virus core antigen, in which the main recognition 
sites have been shown to be discontinuous (7, 20). 

Four of the 48 initially anti-HCV-negative serum samples 
showed reproducible and inhibitable reactivities to some of 
the HCV core peptides, and at a later stage, two of the serum 
samples were confirmed to be anti-HCV positive. The other 
two serum samples with a low level of reactivity to single 
peptides were likely unspecific. 

The amino acid variations within the identified immuno- 
dominant regions did not have a large impact on immuno- 
logical recognition, and therefore, it can be assumed that 
antibodies to determinants within the HCV core are mainly 
cross-reactive between the various strains of HCV. 

The combination of two HCV core peptides gave a sensi- 
tivity of 92% in detecting anti-HCV-positive sera, as deter- 
mined by a combination of anti-HCV assays containing 
recombinant structural proteins, nonstructural proteins, or 
both. In two other series of serum samples, one series from 
blood donors and one series from immunosuppressed pa- 
tients, we isolated seroreactivities to single HCV core pep- 
tides covering the regions at residues 21 to 38, 31 to 48, and 
51 to 68, which suggests an increased sensitivity when these 
peptides are included (unpublished data). 

The inclusion of the HCV core peptide in new commercial 
kits was a necessary step, since studies have shown that the 
assay with the recombinant C-100-3 gives a high rate of 
nonspecific reactions in blood donor populations (16) and 
also that anti-C-100-3-negative donors may transmit HCV 
(9). Also, inclusion of the core and NS3 proteins shortens the 
time for seroconversion in the serodiagnosis of HCV (15). 

However, we found that about 4 to 8% of HCV-infected 
people do not have antibodies to the HCV core protein. This 
is similar to the results of Chiba et al. (3), who found that 
5.2% of the 58 serum samples from persons with clinically 
well defined chronic non-A, non-B viral hepatitis did not 
react with a recombinant core protein. 

In conclusion, we showed that short synthetic peptides 
mimic the antigenic regions found on the HCV core protein. 
Thus, the defined immunodominant regions should be fur- 
ther evaluated regarding their potential as tools in character- 


izing the immune response to the HCV core protein during 
infection. 
ABSTRACT We previously sequenced the 5' noncoding 
region of 44 isolates of hepatitis C virus (HCV), as well as the 
envelope 1 (E1) gene of 51 HCV isolates, and provided evidence 
for the existence of at least 6 major genetic groups consisting of 
at least 12 minor genotypes of HCV (i.e., genotypes I/1a, 
II/1b, III/2a, IV/2b, 2c, V/3a, 4a-4d, 5a, and 6a). We now 
report the complete nucleotide sequence of the putative core (C) 
gene of 52 HCV isolates that represent all of these 12 genotypes 
as well as two additional genotypes provisionally designated 4e 
and 4f that we identified in this study. The phylogenetic 
analysis of the C gene sequences was in agreement with that of 
the E1 gene sequences. A major division in the genetic distance 
was observed between HCV isolates of genotype 2 and those of 
the other genotypes in analysis of both the E1 and C genes. The 
C gene sequences of 9 genotypes have not been reported 
previously (i.e., genotypes 2c, 4a-4f, 5a, and 6a). Our analysis 
indicates that the C gene-based methods currently used to 
determine the HCV genotype, such as PCR with genotype- 
specific primers, should be revised in light of these data. We 
found that the predicted C gene was exactly 573 nt long in all 
52 HCV isolates, with an N-terminal start codon and no 
in-frame stop codons. The nucleotide and predicted amino acid 
identities of the C gene sequences were in the range of 79.4- 
99.0% and 85.3-100%, respectively. Furthermore, we 
mapped universally conserved, as well as genotype-specific, 
nucleotide and deduced amino acid sequences of the C gene. 
The predicted C proteins of the different HCV genotypes 
shared the following features: (i) high content of proline 
residues, (ii) high content of arginine and lysine residues 
located primarily in three domains with 10 such residues 
invariant at positions 39-62, (iii) a cluster of 5 conserved 
tryptophan residues, (iv) two nuclear localization signals and a 
DNA-binding motif, (v) a potential phosphorylation site with a 
serine-proline motif, and (vi) three conserved hydrophilic 
domains that have been shown by others to contain immuno- 
genic epitopes. Thus, we have extended analysis of the pre- 
dicted C protein of HCV to all of the recognized genotypes, 
confirmed the existence of highly conserved regions of this 
important structural protein, and demonstrated that the ge- 
netic relatedness of HCV isolates is equivalent when analyzing 
the most conserved (i.e., C) and the most variable (i.e., E1) 
genes of the HCV genome. 


Hepatitis C virus (HCV) is an important human pathogen that 
can cause acute and chronic hepatitis, liver cirrhosis, and, 
possibly, hepatocellular carcinoma. The virus particles con- 
tain a positive polarity, single-stranded RNA genome with 5' 
and 3' noncoding (NC) regions. The core (C), envelope 1 
(E1), and envelope 2 proteins are encoded at the 5' terminus 
and the nonstructural proteins are encoded at the 3' terminus 
of the single open reading frame of the genome (1,2). It is now 


well established that there are a number of different geno- 
types of HCV, which may have important implications for 
pathogenesis, diagnosis, and vaccine development. Based on 
analysis of HCV isolates sequenced in their entirety, Okamoto 
et al. (3) demonstrated that all previously published 
sequences could be grouped into four such genotypes [i.e., 
genotypes I/1a (2), II/1b (4), III/2a (5), and IV/2b (3)]. At 
approximately the same time, our analysis of the highly 
conserved 5' NC region of 44 HCV isolates from around the 
world suggested the existence of additional genetic groups of 
HCV (6). Our subsequent analysis of the highly variable E1 
gene from 51 HCV isolates (7) confirmed the presence of 12 
genotypes divided into at least six major genetic groups and 
accompanying subgroups (i.e., genotypes I/1a, II/1b, III/2a, 
IV/2b, 2c, V/3a, 4a-4d, 5a, and 6a). Genotype V/3a isolates 
have also been found by others (8-12). Subsequently, Simmonds 
et al. (13) confirmed the existence of multiple major 
genetic groups of HCV by sequence analysis of a short region 
within the NS-5 gene. However, until definitive overlapping 
sequences are available, the genetic relatedness of HCV 
isolates designated genotypes 4, 5, and 6 by Simmonds and 
coworkers to our isolates of genotypes 4a-4d, 5a, and 6a 
cannot be determined. Taken together, it is clear that there 
are at least 12 genotypes of HCV. 

In this report, we have designated the various genotypes by 
the nomenclature proposed by Okamoto et al. (3) and by 
Chan et al. (9). However, as we have pointed out (7), the 
proposed classification schemes should be considered pro- 
visional until more data are obtained. In the present study, we 
have determined the complete nucleotide sequence of the C 
gene in 52 HCV isolates* that represent the 12 recognized 
HCV genotypes as well as two additional genotypes, 4e and 
4f, identified in this study. 

MATERIALS AND METHODS 

Sera analyzed in this study were from 52 individuals from 12 
countries, who were positive for antibodies to HCV (anti- 
HCV) by a first-generation test (14). The consensus 5' NC and 
E1 gene sequences of the HCV RNA from these sera were 
previously analyzed (6, 7). In this study we have analyzed the 
consensus C gene sequence of these same HCV isolates. The 
procedures that were used for viral RNA extraction, cDNA 
synthesis, and nested PCR have been described (14). For the 
cDNA PCR assay, we used HCV-specific synthetic oligonu- 
cleotides deduced from previously determined sequences that 
flank the C gene (6, 7). In 51 of the 52 HCV isolates studied, 
we amplified the entire C gene and adjacent 5' NC and E1 
sequences. However, in isolate Z7 the 36 nt at the 3' end of C 
were from a second DNA fragment that we obtained previ- 


Abbreviations: HCV, hepatitis C virus; NC, noncoding; C, core; E1, 
envelope 1. 

*The sequences reported in this paper have been deposited in the 
GenBank data base (accession nos. U10189-U10240). 
ously (7). Amplified DNA was purified by gel electrophoresis 
followed by glass-milk extraction (7) or electroelution and both 
strands were sequenced directly. In 44 of the 52 HCV isolates 
studied, we used the procedures for direct sequencing de- 
scribed previously (7). For a number of the HCV isolates, 
confirmatory sequencing was performed with the Applied 
Biosystems automated DNA sequencer (model 373A) and 8 
HCV isolates of genotype I/1a or II/1b were sequenced 
exclusively by this method. Multiple sequence alignments 
were performed with the program GENALIGN (15). Phylogenetic 
trees were constructed by the unweighted pair-group 
method with arithmetic mean that is based on the assumption 
of a constant rate of evolution (16). 
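
For readers unfamiliar with the unweighted pair-group method with arithmetic mean (UPGMA), the clustering step can be sketched as follows. This is a generic textbook implementation, not the code of the software used in the study; the isolate names and distances are hypothetical.

```python
from itertools import combinations


def upgma(names, matrix):
    """Cluster by UPGMA. matrix[i][j] is the distance between names[i] and names[j].

    Returns a nested tuple (left, right, height); leaves are plain names.
    """
    trees = {i: names[i] for i in range(len(names))}
    sizes = {i: 1 for i in range(len(names))}
    dist = {(i, j): matrix[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        height = dist[(i, j)] / 2.0     # node height under a constant rate
        new_dist = {}
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Size-weighted mean = arithmetic mean over all original members.
            new_dist[(k, next_id)] = (
                sizes[i] * d_ik + sizes[j] * d_jk
            ) / (sizes[i] + sizes[j])
        # Drop distances that involve the two merged clusters.
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        dist.update(new_dist)
        trees[next_id] = (trees.pop(i), trees.pop(j), height)
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        next_id += 1
    return next(iter(trees.values()))


# Hypothetical pairwise distances among four isolates.
names = ["A", "B", "C", "D"]
matrix = [
    [0.00, 0.02, 0.20, 0.20],
    [0.02, 0.00, 0.20, 0.20],
    [0.20, 0.20, 0.00, 0.10],
    [0.20, 0.20, 0.10, 0.00],
]
tree = upgma(names, matrix)
print(tree)
```

Because each node is placed at half the distance between the clusters it joins, the resulting tree is ultrametric, which is the constant-rate assumption mentioned in the text.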

RESULTS AND DISCUSSION 

In this study, we successfully reverse-transcribed and, by 
PCR, amplified the entire C gene from HCV isolates representing 
the 12 genotypes identified by analysis of E1 sequences 
(7) and from all of the HCV isolates with unique 5' 
NC sequences (6) that we could not amplify in the previous 
study of E1. All 73 negative control samples interspersed 
among the test samples were negative for HCV RNA. The 
amplified DNA fragment obtained in 50 of the 52 HCV 
isolates was specifically designed to overlap with our previously 
obtained 5' NC-C sequences (6) and C-E1 sequences (7) 
at ≈80 nt positions each. A complete match was observed in 
6033 of 6035 overlapping nt. Two discrepancies were observed 
in isolate US6 at nt 552 (C and T) and 561 (C and T). 
This may have been due to microheterogeneity at these 





nucleotide positions, since the remaining overlapping se- 
quence was unique for isolate US6. In addition, there were 
three confirmed instances of microheterogeneity: nt 33 in 
isolate SA11 (C,T, and T), nt 36 in isolate S45 (A,C, and A), 
and nt 552 in isolate P10 (C,T, and T). Overall, the excellent 
agreement in these overlapping sequences in this study with 
that of the two previous studies definitively ruled out con- 
tamination as a source of nonauthentic HCV sequences. 
Furthermore, this analysis proved that the sequences ob- 
tained were from a single population and not from different 
populations as could happen in mixed infections. 

Analysis of the Nucleotide Sequence of the C Gene. We now 
report the nucleotide (nt 1-573) and deduced amino acid (aa 
1-191) sequences of the putative C gene of 52 HCV isolates. 
Relative to the prototype sequence (1,2), we found that the 
C gene was exactly 573 nt long in all 52 HCV isolates with an 
N-terminal start codon and no in-frame stop codons. Micro- 
heterogeneity, defined previously (7), was observed in 26 of 
the 52 HCV isolates at 0.2-1.4% of the 573 nucleotide 
positions of the C gene and resulted in changes in 0.5-1.0% 
of the 191 predicted aa in 12 of these isolates. We performed 
a multiple sequence alignment (data not shown) and found 
that the nucleotide identities of the C gene among these HCV 
isolates were in the range 79.4-99.0%. Since we were inter- 
ested in comparing the genetic relatedness of HCV isolates in 
different gene regions we constructed phylogenetic trees of 
the C gene of all 52 HCV isolates from this study and the E1 
gene of 51 HCV isolates from our previous study (7) using the 
unweighted pair-group method with arithmetic mean (16) 
(Fig. 1). In both dendrograms we observed a division of the 
Fig. 1. Phylogenetic trees showing calculated evolutionary relationships of the different HCV isolates based on the C gene sequence of 52 
HCV isolates and the E1 gene sequence of 51 HCV isolates. Phylogenetic trees were constructed by the unweighted pair-group method with 
arithmetic mean (16) using the computer software package GeneWorks from IntelliGenetics. Lengths of the horizontal lines connecting the 
sequences, given in absolute values from 0 to 1, are proportional to the estimated genetic distances between the sequences. Genotype 
designations of HCV isolates are indicated. In 45 HCV isolates, we determined both the C and the E1 gene sequences. 
45 HCV isolates that were common to the two studies into at 
least six major genetic groups (genotypes 1-6) and 12 minor 
genetic groups (genotypes I/1a, II/1b, III/2a, IV/2b, 2c, 
V/3a, 4a-4d, 5a, and 6a). It is noteworthy that we observed 
a major division in genetic distance between HCV isolates of 
genotype 2 and those of the other genotypes in the phyloge- 
netic analyses of both gene sequences. Furthermore, the 
divergence of the minor genotypes within genotype 2 exhibits 
a degree of heterogeneity that is equivalent to that observed 
among the major genotypes. Analysis of the C gene from 
isolates Z5 and Z8, which had a unique 5' NC sequence (6) 
but from which we could not amplify the E1 gene, revealed 
that these isolates represented two additional genotypes. We 


are provisionally assigning designations 4e and 4f to these 
genotypes that have not been described previously. Although 
Simmonds et al. (17) have published partial C gene sequences 
(i.e., nt 29-269) of HCV isolates that appear to be most 
closely related to our isolates of major genotype 4, final 
classification of these isolates must await completion of the 
gene sequence. Unfortunately, a sequence motif within the C 
gene (i.e., nt 186-221) that has been suggested to be predic- 
tive of genotype (8) does not reflect the genotype divisions 
observed by our analysis of the complete C gene. Overall, we 
have demonstrated that the genetic relatedness of HCV 
isolates is equivalent when analyzing the most conserved 
gene (C) and one of the most variable genes (E1) of the HCV 
Fig. 2. Alignment of the consensus sequence of the C gene of the different genotypes of HCV. Consensus sequence of the C gene from all 
52 HCV isolates studied is shown at the top. Furthermore, a consensus sequence of the C gene was obtained for genotypes I/1a, II/1b, III/2a, 
IV/2b, 3a, and 5a. The sequence of genotype 4c is represented by isolate Z6. Genotypes 4a, 4b, 4d, 4e, 4f, and 6a each contained only a single 
isolate. The exact HCV isolates representing the different genotypes can be seen in the phylogenetic tree of the C gene sequences in Fig. 1. 
Invariant nucleotides within a consensus sequence are capitalized and variable nucleotides are shown in lowercase letters. However, nucleotides 
that were invariant among all 52 HCV isolates are shown as dashes in the alignment. In the 14 nt positions where no consensus sequence was 
obtained, we show the nucleotide that differed from that of the other genotypes. 



8242 Biochemistry: Bukh et al. 


Proc. Natl. Acad. Sci. USA 91 (1994)


genome, providing strong evidence for the suggested division 
into major and minor genotypes. 

To study further the heterogeneity of the C gene, we 
obtained the consensus sequence of this gene from the 52 
HCV isolates (Fig. 2). We found that a total of 335 (58.5%) of 
the 573 nucleotides of the C gene were invariant among these 
HCV isolates. Nucleotides at the first and second codon 
positions were invariant at 70.7% and 81.7% of these posi- 
tions, respectively, while nucleotides at the third position 
were invariant at only 23.0% of such positions. Stretches of 
6 or more invariant nt were observed from nt 1-8, 22-27, 
85-92, 110-125, 131-141, 334-340, 364-371, 397-404, and 
511-516 and may be suitable for anchoring primers for 
amplification of HCV RNA in cDNA PCR assays. Finally, we 
documented the genotype-specific sequences within the C 
gene by aligning the consensus sequences of all 14 genotypes 
(Fig. 2). Although the full-length sequences of the C gene of
isolates representing genotypes I/1a, II/1b, III/2a, IV/2b,
and V/3a have been reported by others (2-5, 11), those of 9
of the 14 genotypes (i.e., 2c, 4a-4f, 5a, and 6a) have not been
reported previously. Overall, we have mapped universally 
conserved sequences as well as genotype-specific sequences 
of the C gene among 14 genotypes of HCV. 
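
The per-codon-position tallies and the invariant stretches described above can be recomputed from any gap-free alignment of coding sequences. The following Python sketch is illustrative only: the function names and the three toy sequences are hypothetical and are not data from this study, and the reading frame is assumed to start at the first column.

```python
from typing import List, Tuple


def invariant_mask(aligned: List[str]) -> List[bool]:
    """True at each column where every sequence carries the same nucleotide."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    return [len({s[i].upper() for s in aligned}) == 1 for i in range(length)]


def invariance_by_codon_position(mask: List[bool]) -> List[float]:
    """Percent invariant columns at codon positions 1, 2 and 3."""
    result = []
    for pos in range(3):
        cols = mask[pos::3]  # every third column, starting at this codon position
        result.append(100.0 * sum(cols) / len(cols))
    return result


def invariant_stretches(mask: List[bool], min_len: int = 6) -> List[Tuple[int, int]]:
    """1-based inclusive (start, end) of runs of at least min_len invariant columns."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, inv in enumerate(mask + [False]):  # sentinel closes a trailing run
        if inv and start is None:
            start = i
        elif not inv and start is not None:
            if i - start >= min_len:
                runs.append((start + 1, i))
            start = None
    return runs


# Hypothetical toy alignment (12 nt, 4 codons); not sequences from the paper.
toy = [
    "ATGAGCACGAAT",
    "ATGAGCACAAAC",
    "ATGAGCACTAAT",
]
mask = invariant_mask(toy)
print(invariance_by_codon_position(mask))
print(invariant_stretches(mask))
```

In the toy alignment only third-position columns vary, which mirrors the pattern reported here (invariance of 70.7% and 81.7% at the first and second codon positions versus 23.0% at the third). The overall figure follows the same arithmetic: 335 of 573 nucleotides is 58.5%.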

Analysis of the Deduced Amino Acid Sequence of the C Gene. 
To study the heterogeneity of the C protein, we performed a 
multiple sequence alignment of the predicted amino acids for 
all 52 HCV isolates (data not shown) and obtained a consen- 
sus sequence (Fig. 3). The identities of the predicted 191 aa 
of the C protein among these HCV isolates were in the range 
85.3-100.0%. A total of 132 (69.1%) of the 191 aa of the C 
protein were invariant. The most prevalent amino acids in the 
consensus sequence were glycine (13.6%), arginine (12.6%), 
proline (11.0%), and leucine (9.9%). The most conserved 
amino acids were tryptophan (5 of 5 aa invariant), aspartic 
acid (5 of 5 aa invariant), proline (19 of 21 aa invariant), and 
glycine (23 of 26 aa invariant). Previous analyses indicated 
that HCV is evolutionarily related to pestiviruses (15). In this
regard, it is of interest to note that the C proteins of both 
viruses have a high content of proline residues (18), which are 
likely to be important in maintaining the structure of this 
protein. As is characteristic for a protein that binds to nucleic 
acid, we found that the C protein has conserved amino acids 
that are basic and positively charged, and these are capable 
of neutralizing the negative charge of the HCV RNA encap- 
sidated by this protein (19). Specifically, >16% of the amino 
acids in the consensus sequence of the C protein of HCV are 
arginine and lysine that are located primarily in three clusters 
(i.e., from aa 6-23, 39-74, and 101-121) (20) (Fig. 3). The 10
arginine and lysine residues within aa 39-62 are invariant 
among all 52 HCV isolates, suggesting that this domain may
represent an important RNA-binding site. The capsid proteins
of the related flavi- and pestiviruses (15) also have a high
content of arginine and lysine (18, 19). Although there are 
three major hydrophilic regions (i.e., aa 2-23, 39-74, and 
101-121) that are conserved in all 52 HCV isolates, the 
remainder of the C protein is hydrophobic. Interestingly, one 
such highly conserved hydrophobic domain at aa 24-39 is 
flanked by proline residues. The hydrophobic domains are 
likely to be involved in protein-protein and/or protein-RNA 
interactions during assembly of the nucleocapsid as well as in 
interaction with the lipoprotein envelope, as has been suggested
for flaviviruses (19). Other significant observations are
(i) a cluster of 5 invariant tryptophan residues at aa 76-107;
(ii) the lack of an N-linked glycosylation site (NXT/S); (iii)
two potential nuclear localization signals (i.e., PRRGPR at aa
38-43 and PRGRRQP at aa 58-64) that are present in all 52
HCV isolates (20); and (iv) a putative DNA-binding motif
SPRG at aa 99-102, found in 51 of the 52 HCV isolates, with 
SP present in all 52 isolates. Our finding of conserved nuclear 
localization signals and a DNA-binding motif adds support to 
the hypothesis that the C protein of HCV might also function
as a gene-regulatory protein (20). Furthermore, it has been 
suggested that the HCV C protein is posttranslationally 
modified through phosphorylation (20, 21). Interestingly, we 
found that the C protein of all 52 HCV isolates contained an SP
motif that was recently demonstrated to be essential for C 
protein phosphorylation in hepadnaviruses (22). Our study 
demonstrates that the C protein has features that are highly 
conserved among the various genotypes of HCV and that are 
known to be characteristic of capsid proteins of other related 
viruses. 
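
The motif and composition observations listed above (absence of an NXT/S site, presence of SP, SPRG, and the putative nuclear localization signals, and the arginine plus lysine content) amount to simple pattern scans over a protein sequence. A minimal Python sketch follows. The helper names and the 18-residue toy sequence are hypothetical and were constructed only to contain some of the motifs named in the text; it is not the HCV C protein sequence. The glycosylation pattern is written as NXT/S exactly as in the text, although many definitions additionally exclude proline at X.

```python
import re
from typing import Dict, List


def find_motif(seq: str, pattern: str) -> List[int]:
    """1-based start positions of all matches of pattern, including overlapping ones."""
    # A zero-width lookahead lets re.finditer report overlapping occurrences.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]


def basic_fraction(seq: str) -> float:
    """Percent of residues that are arginine (R) or lysine (K)."""
    return 100.0 * sum(seq.count(aa) for aa in "RK") / len(seq)


def scan(seq: str) -> Dict[str, List[int]]:
    """Positions of the motifs discussed in the text within seq."""
    return {
        "NXT/S": find_motif(seq, "N.[ST]"),  # N-linked glycosylation sequon as written in the text
        "SP": find_motif(seq, "SP"),
        "SPRG": find_motif(seq, "SPRG"),
        "PRRGPR": find_motif(seq, "PRRGPR"),
        "PRGRRQP": find_motif(seq, "PRGRRQP"),
    }


# Hypothetical toy sequence; not the HCV C protein.
toy = "MSTKPRRGPRAAWSPRGL"
print(scan(toy))
print(round(basic_fraction(toy), 1))
```

Applied to each sequence of an alignment, the same scan would give the per-isolate counts reported here, such as a motif present in 51 of 52 isolates.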

To study the heterogeneity of the C protein of different 
genotypes, we obtained the consensus sequence of the pro- 
tein for all isolates comprising the 14 HCV genotypes (Fig. 3). 
We mapped the genotype-specific sequences within the C 
protein by then aligning these consensus sequences (Fig. 3). 
It should be noted that phylogenetic analysis of the amino 
acid sequence of the C proteins was not capable of resolving 
the minor groups within genotypes 1 and 4 because of the 
conservation of this protein (data not shown). Overall, we 
identified only a few type-specific amino acids (Fig. 3). One 
striking example was that isolates of genotype 4 have an 
additional methionine at position 20 that is specific for this 
major genetic group. Finally, we analyzed the conservation 
of the sequences surrounding the cleavage site between the 
C and the E1 proteins of the different genotypes, which has
been determined to be between aa 191 (alanine) and 192
(tyrosine) in HCV isolates of genotype 1 (1). We previously
found that the N-terminal amino acids of E1 were variable
even within genotype 1 (7). In this study, we find that the
C-terminal sequence of C is SA in all but 1 of the 48 HCV
isolates comprising genotypes 1, 2, 4, 5, and 6. However, all
4 HCV isolates of genotype 3 in this study, as well as isolates
of genotype 3 published previously (11, 12), contain AS at this
position. Thus, studies will be needed to determine the C/E1
cleavage site in genotype 3 isolates. Overall, we have mapped
universally conserved sequences, as well as genotype-specific
sequences, of the C protein among 14 genotypes of HCV.

Fig. 3. Alignment of consensus sequence of deduced amino acid sequences of the C gene of the
different genotypes of HCV. Consensus sequence of the C protein from all 52 HCV isolates studied is
shown at the top. In the 2 aa positions where no consensus sequence was obtained, we show the
amino acid that differed from that of other genotypes. See also legend to Fig. 2.

Detection of antibodies directed against the HCV core 
protein is important in diagnosis of HCV infection. The 
recombinant C22-3 protein, spanning aa 2-120 of the C gene, 
is a major component of the commercially available second- 
generation anti-HCV tests. Several studies have indicated 
that the three major hydrophilic regions of the C protein 
contain linear immunogenic epitopes (summarized in ref. 23). 
For example, antibodies against synthetic peptides from aa 
1-18, 51-68, and 101-118 were detected in infected patients 
(23). Our study demonstrates that, while these immunogenic 
regions are highly conserved, genotype-specific differences 
are observed at several amino acid positions that may influ- 
ence the specificity and sensitivity of the serological tests 
(Fig. 3). One such example is that a single substitution at aa 
110 has been demonstrated to affect seroreactivity (23). 
Despite the high degree of conservation in the immunodom- 
inant regions of the C protein among the different genotypes, 
it is possible that genetic heterogeneity of the C protein could 
lead to false-negative results in current serological tests. 

Methods for Genotype Analysis. Several methods have been
used to determine the genotype of HCV isolates without
resorting to sequence analysis. These include PCR followed
by (i) amplification with type-specific primers (24); (ii)
determination of restriction-length polymorphism (17); and (iii)
specific hybridization (25). The proposed methods have
primarily been based on 5' NC and C sequences. Our
previous studies suggested that 5' NC-based genotyping
systems would be predictive of only the major genetic groups 
of HCV (6, 7). The most widely used C-based genotype 
system has been the PCR assay with type-specific primers 
that was designed for distinguishing HCV isolates of genotypes
I/1a, II/1b, III/2a, IV/2b, and V/3a (11, 24). Since this
system was developed before identification of genotypes 2c, 
4a-4f, 5a, and 6a, there are significant limitations to this 
typing system. For example, the primers specific for geno- 
type IV/2b (nt 270-251) are as highly conserved within our 
isolates of genotypes 4c and 6a as within the isolates of 
genotype IV/2b. Thus, this assay probably cannot distin- 
guish among these genotypes. Another C-based approach 
involves distinguishing between genotypes 1 and 2 by type- 
specific antibody responses (26). Synthetic peptides com- 
posed of aa 65-81 were found to be genotype-specific for 
genotypes 1 and 2 in ELISAs. Our analysis of amino acid 
sequences demonstrated significant variation within isolates 
of genotypes 1 and 2. Thus, it is likely that these peptides will 
not identify all isolates of genotypes 1 and 2. Furthermore, 
the peptide for genotype 1 was highly conserved within 
isolates of genotypes 3 and 4 (Fig. 3) and might detect 
antibodies against these genotypes as well. It should be 
pointed out that most isolates of genotypes 3 and 4 had an 
identical amino acid sequence at positions 65-81. Overall, 
the proposed C-based genotyping systems should be revised 
in light of the C gene sequence data presented here, and a 
more definitive approach such as sequence analysis of gene 
regions that are predictive of genotype may be necessary for 
a definitive determination. 
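
Whether a type-specific primer or peptide can discriminate between genotypes depends on how conserved its target window is across them. A minimal Python sketch of that comparison follows, assuming two already aligned sequences of equal length. The function name and the toy sequences are hypothetical and do not represent HCV data.

```python
def window_identity(a: str, b: str, start: int, end: int) -> float:
    """Percent identical positions between aligned sequences a and b
    over the 1-based inclusive window [start, end]."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    if not 1 <= start <= end <= len(a):
        raise ValueError("window lies outside the alignment")
    window_a, window_b = a[start - 1:end], b[start - 1:end]
    matches = sum(x == y for x, y in zip(window_a, window_b))
    return 100.0 * matches / len(window_a)


# Hypothetical toy consensus sequences for three types; not HCV data.
type_x = "ACDEFGHIKL"
type_y = "ACDEYGHIKL"
type_z = "ACNEYGHVKL"

print(round(window_identity(type_x, type_y, 3, 8), 1))
print(window_identity(type_x, type_z, 3, 8))
```

A reagent designed against one type would be expected to cross-react with any other type whose window identity approaches 100%, which is the concern raised above for the aa 65-81 peptides and the genotype IV/2b primers.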

Conclusion. The genetic relatedness of HCV isolates is 
equivalent when analyzing the most conserved (i.e., C) and 
the most variable (i.e., E1) genes. The results of this study
have implications for the taxonomy of HCV and for the 
diagnosis, prevention, and therapy of HCV infections. 